Remarks 

Applicants' representative thanks the Examiner for the courtesies extended during 
the telephonic conference on March 11, 2008, with Francis Dunn. During the 
conference, there was discussion regarding the rejection of the subject claims under 35 
U.S.C. § 101, 35 U.S.C. § 112, and 35 U.S.C. § 102. There also was discussion 
regarding "training algorithm" as well as discussion regarding "cost". 

Claims 1, 3-25, 27-30, 32-42, 44-48, 50-60 and 62-64 are currently pending in the 
subject application and are presently under consideration. Claims 1, 3-25, 27-30, 32-42, 
44-48, 51-57, 59, and 62-64 have been amended as shown on pages 2-25 of the Reply. 
No new matter has been added. 

Favorable reconsideration of the subject patent application is respectfully 
requested in view of the comments and amendments herein. 

I. Rejection of Claims 1, 3-25, 27-30, 32-42, 44-48, 50-60 and 62-64 Under 35 
U.S.C. § 101 

Claims 1, 3-25, 27-30, 32-42, 44-48, 50-60, and 62-64 stand rejected under 35 
U.S.C. § 101 on the grounds that the claimed invention is directed to non-statutory 
subject matter. Withdrawal of this rejection is requested for at least the following 
reasons. Claims 1, 3-25, 27-30, 32-42, 44-48, 50-60, and 62-64 produce a useful, 
concrete and tangible result and are directed to statutory subject matter in accordance 
with 35 U.S.C. § 101. 

Because the claimed process applies the Boolean principle 
[abstract idea] to produce a useful, concrete, tangible result ... on 
its face the claimed process comfortably falls within the scope of 
§101. AT&T Corp. v. Excel Communications, Inc., 172 F.3d 1352, 
1358 (Fed. Cir. 1999) (emphasis added); see State Street Bank & 
Trust Co. v. Signature Fin. Group, Inc., 149 F.3d 1368, 1373, 47 
USPQ2d 1596, 1601 (Fed. Cir. 1998). The inquiry into patentability 
requires an examination of the contested claims to see if the 
claimed subject matter, as a whole, is a disembodied mathematical 
concept representing nothing more than a "law of nature" or an 
"abstract idea," or if the mathematical concept has been reduced to 
some practical application rendering it "useful." AT&T at 1357 
citing In re Alappat, 33 F.3d 1526, 1544, 31 U.S.P.Q.2d (BNA) 
1545, 1557 (Fed. Cir. 1994) (emphasis added). 

The claimed subject matter relates generally to systems and methods that facilitate 
building a model to characterize data based on an appropriately sized subset of a 
computer readable data set. In particular, independent claim 1, as amended, recites: [a] 
computer implemented system that facilitates building a statistical model for a 
computer readable data set, comprising: a first training method that efficiently builds a 
rough statistical model from a subset of the computer readable data set capable of 
statistical characterization; an evaluation component that evaluates the rough statistical 
model to determine whether the subset of the computer readable data set is an 
appropriate subset to be utilized to build a refined statistical model for the computer 
readable data set based at least in part on stopping criterion to facilitate reducing cost 
of clustering data relative to the computer readable data set; a second training method 
that builds the refined statistical model for the computer readable data set from the 
subset if the subset is deemed appropriate by the evaluation component, the refined 
statistical model provides a more accurate modeling of the subset than the rough 
statistical model and facilitates determining good clustering of data for a fixed number 
of clusters based at least in part on predefined accuracy criteria to facilitate clustering 
of data relative to the computer readable data set, wherein the clustered data is 
provided; and a data scheduler that, based at least in part on a data policy, adaptively 
controls the size of subsets for which the first training method is applied to facilitate 
building the refined statistical model. 

The claimed subject matter recites features and/or functionality that can facilitate 
constructing a refined statistical model from statistically characterizable data associated 
with a set of data (e.g., computer readable data set) in a computationally economic and 
time-efficient manner, and utilizes the refined statistical model to facilitate determining 
clusters of data relative to the data set based in part on predefined accuracy criteria. The 
clustered data can be provided as an output. Clustering a set of data is a useful, concrete 
and tangible result, as clustering data can be useful in gaining knowledge regarding the 
set of data, for instance, gaining knowledge regarding common relationships or traits 
among the respective pieces of data in the data set. The claimed subject matter also can 
be useful with regard to data mining, including clusterization of data obtained through 
data mining. 

In a manner similar to independent claim 1, independent claims 19, 30, 42, 44, 53, 
54, and 62-64 each contain subject matter that can facilitate building statistical models to 
facilitate clustering data relative to a set of data (e.g., computer readable data set) in a 
cost-efficient manner, wherein the clustered data can be provided. Independent claim 19, 
as amended, recites: [a] computer implemented system programmed to facilitate 
building a statistical model, comprising: a first parameter estimation protocol that 
efficiently builds a rough statistical model from a subset of a computer readable data set 
. . . , an evaluation component that determines whether the subset of data from which 
the rough statistical model was built is an acceptable size for building the statistical 
model to characterize the data set, the evaluation component utilizes a stopping 
criterion that is functionally related to an expected incremental benefit and an expected 
incremental cost associated with increasing the size of the subset of data to facilitate 
determining whether the rough statistical model is an acceptable size and to facilitate 
reducing cost of clustering data relative to the computer readable data set, and a 
second parameter estimation protocol that builds a refined statistical model for the data 
set from the subset if determined to have the acceptable size, . . . the refined statistical 
model employed to identify clusters of data within the computer readable data set to 
facilitate clustering data relative to the computer readable data set, wherein the 
clustered data is provided. 

Independent claim 30, as amended, recites: [a] computer implemented learning 
curve method to facilitate building a statistical model, comprising: . . . employing a first 
training method to build a rough statistical model to characterize the subset; evaluating 
the rough statistical model for acceptability, if the rough statistical model is 
unacceptable, repeatedly increasing the size of the subset of data to provide an aggregate 
data set, building another rough statistical model to characterize the aggregate subset, and 
reevaluating the other rough statistical model, the acceptability of each rough statistical 
model based at least in part on a stopping criterion functionally related to an expected 
incremental benefit and an expected incremental cost associated with increasing the size 
of the aggregate subset in order to facilitate reducing cost associated with clustering 
data relative to the computer readable data set, and if the rough statistical model is 
acceptable, employing a second training method to build a refined statistical model 
based at least in part on the aggregate data set, the second training method being different 
from the first training method, the refined statistical model identifies data clusters 
contained in the computer readable data set to facilitate clustering of data relative to 
the computer readable data set, wherein the clustered data is provided. 

Independent claim 42, as amended, recites: [a] computer-readable medium 
having computer-executable instructions for: . . . building a rough statistical model to 
characterize the subset based at least in part on an associated training policy; evaluating 
the rough statistical model for acceptability, if the rough statistical model is 
unacceptable, repeatedly increasing the size of the subset of data to provide an aggregate 
data set, building a rough statistical model to characterize the aggregate subset based at 
least in part on an associated training policy, and reevaluating the rough statistical model; 
building a refined statistical model for the computer readable data set from the 
aggregate data set if the rough statistical model is determined to be acceptable based at 
least in part on an associated training policy that includes determining acceptability 
based at least in part on an expected incremental benefit relative to an expected 
incremental cost associated with increasing the size of the aggregate data set in order 
to facilitate reducing cost associated with clustering data relative to the computer 
readable data set, the refined statistical model more accurately characterizes the 
aggregate data set; and utilizing the refined statistical model to identify identifiable 
clusters in the computer readable data set to facilitate clustering data relative to the 
computer readable data set, wherein the clustered data is provided. 

Independent claim 44, as amended, recites: [a] computer implemented method to 
facilitate constructing a statistical model, comprising: . . . determining a data subset 
from the training data set by estimating statistical model parameters according to a 
first training policy and evaluating the estimated statistical model parameters relative 
to the holdout data set and repeating the estimation and evaluation of statistical model 
parameters with a larger subset of the training data set until an acceptable quality of the 
estimated statistical model is established to facilitate reducing cost associated with 
characterizing clusters relative to the computer readable data; controlling parameter 
initialization employed in each estimation of statistical model parameters repeatedly until 
an acceptable size for the determined data subset is achieved; and subsequent to 
establishing the acceptable quality of the estimated statistical model, using the 
determined data subset to improve the estimated statistical model parameters by 
employing a second training policy that is more accurate than the first training policy, 
the estimated model parameters obtained from employment of the second training policy 
utilized to characterize at least one cluster within the computer readable data to 
facilitate clustering data relative to the computer readable data, wherein the clustered 
data is provided. 

Independent claim 53, as amended, recites: [a] computer-readable medium 
having computer-executable instructions for: . . . determining a data subset from the 
training data set by estimating model parameters and controlling model parameter 
initialization according to a first training policy and evaluating the estimated model 
parameters relative to the holdout data set and repeating the estimation, initialization, and 
evaluation of model parameters with a next successively larger subset of the training data 
set until an acceptable quality of the estimated model is established to facilitate reducing 
cost associated with clustering data relative to the computer readable data, subsequent 
to establishing the acceptable quality of the estimated model, using the determined data 
subset to improve the estimated model parameters by employing a second training 
policy that is more accurate than the first training policy; and utilizing the estimated 
model parameters determined by utilization of the second training policy to identify a 
cluster in the computer readable data to facilitate clustering data relative to the 
computer readable data, wherein the clustered data is provided. 

Independent claim 54, as amended, recites: [a] computer implemented method to 
facilitate constructing a statistical model, comprising: . . . iteratively estimating 
statistical model parameters for a subset of the training data set over a fixed number of 
iterations and evaluating the estimated statistical model parameters relative to the 
holdout data set; repeating the estimation and evaluation of statistical model 
parameters obtained with successively larger subsets of the training data set until an 
acceptable model quality is established, acceptable model quality determined based at 
least in part on an expected incremental benefit relative to an expected incremental 
detriment associated with an increase in size of each larger training subset of the data 
set in order to facilitate reducing cost associated with clustering data relative to the 
computer readable data; after the acceptable model quality is established, iteratively 
estimating statistical model parameters for the data subset, which provided the 
acceptable model quality, until a better quality of model is provided relative to a 
preceding estimation performed over the fixed number of iterations; and using the better 
quality model relative to the computer readable data to identify at least a cluster of data 
within the computer readable data to facilitate clustering data relative to the computer 
readable data, wherein the at least a cluster of data is provided. 

Independent claim 62, as amended, recites: [a] computer implemented method to 
facilitate constructing a statistical model, comprising: . . . iteratively estimating 
statistical model parameters for a subset of the training data set until a first 
convergence threshold is satisfied and evaluating the estimated statistical model 
parameters relative to the holdout data set; repeating the estimation and evaluation of 
statistical model parameters obtained with successively larger subsets of the training 
data set until determining a size of data subset that provides acceptable statistical 
model parameters, acceptable statistical model parameters attained where the expected 
marginal cost outweighs the expected marginal benefit associated with successively 
larger subsets in order to facilitate reducing cost associated with clustering data 
relative to the computer readable data; after determining the size of data subset that 
provides acceptable statistical model parameters, iteratively estimating statistical model 
parameters for a data subset of the acceptable size until a second convergence 
threshold is satisfied, the second convergence threshold being less than the first 
convergence threshold; and based at least in part on the estimated statistical model 
parameters identified at the second convergence threshold, identifying a good clustering 
data relative to the computer readable data to facilitate clustering data, wherein the 
clustered data is provided. 

Independent claim 63, as amended, recites: [a] computer implemented system to 
facilitate building a statistical model for a computer readable data set, comprising: first 
means for building a rough statistical model to characterize a subset of the computer 
readable data set; means for evaluating the acceptability of the rough statistical model 
based at least in part on an expectational cost-benefit analysis to facilitate reducing 
cost associated with clustering data relative to the computer readable data set, the first 
means building another rough statistical model for a larger subset of the data set if the 
evaluation means determines that a prior rough statistical model is unacceptable; second 
means, which is different from the first means, for building a refined statistical model 
from an aggregate subset of data that yielded the rough statistical model deemed 
acceptable by the evaluation means; and means for identifying a cluster of data within 
the computer readable data set based at least in part on the refined statistical model to 
facilitate clustering data relative to the computer readable data set, wherein the 
clustered data is provided. 

Independent claim 64, as amended, recites: [a] computer implemented system to 
facilitate building a statistical model for a computer readable data set, comprising: first 
means for estimating statistical model parameters from a subset of the computer 
readable data set, the data set is statistically characterizable; means for evaluating the 
estimated statistical model parameters relative to a holdout data set of the data set; 
means for determining a data subset from the training data set by causing the first means 
and the means for evaluating to respectively repeat estimation and evaluation of statistical 
model parameters with a next successively larger subset of the training data set until an 
acceptable quality of the statistical model parameters is established, the quality of the 
statistical model parameters established when the expected cost of generating the next 
successively larger subset outweighs the expected benefit in accuracy of utilizing the 
next successively larger subset in order to facilitate reducing cost associated with 
clustering data relative to the computer readable data set; second means for estimating 
statistical model parameters based at least in part on the determined data subset to 
provide a more accurate estimation of model parameters than the first means; means for 
setting parameters associated with cluster weights of a cluster of data; and means for 
determining the cluster of data contained in the computer readable data set based at 
least in part on the more accurate estimation of statistical model parameters to 
facilitate clustering data relative to the computer readable data set, wherein the 
clustered data is provided. 

In view of at least the foregoing, it is readily apparent that the subject claims 
produce a useful, concrete and tangible result and are directed to statutory subject matter 
in accordance with 35 U.S.C. § 101. Accordingly, withdrawal of this rejection is 
requested. 

II. Rejection of Claims 1, 3-25, 27-30, 32-42, 44-48, 50-60 and 62-64 Under 35 
U.S.C. § 112 

Claims 1, 3-25, 27-30, 32-42, 44-48, 50-60 and 62-64 stand rejected under 35 
U.S.C. § 112, first paragraph, on the grounds that current case law requires such a 
rejection if a 35 U.S.C. § 101 rejection is given. Withdrawal of this rejection is requested 
for at least the following reason. 

In view of at least the amendments to the subject claims and the reasons provided 
with regard to the rejection of the subject claims under 35 U.S.C. § 101, the specification 
contains a written description of the invention, and of the manner and process of making 
and using it, in such full, clear, concise, and exact terms as to enable any person skilled in 
the art to which it pertains, or with which it is most nearly connected, to make and use the 
same, and sets forth the best mode contemplated by the inventor of carrying out his 
invention, in accordance with 35 U.S.C. § 112, first paragraph. Therefore, it is 
respectfully requested that this rejection be withdrawn. 

III. Rejection of Claims 1, 19, 30, 42, and 64 Under 35 U.S.C. § 102(b) 

Claims 1, 19, 30, 42, and 64 stand rejected under 35 U.S.C. § 102(b) as being 
anticipated by Guha et al. (US 5,140,530). Withdrawal of this rejection is requested for 
at least the following reason. Guha et al. does not disclose each and every element as set 
forth in the subject claims. 

For a prior art reference to anticipate, 35 U.S.C. §102 
requires that "each and every element as set forth in the 
claim is found, either expressly or inherently described, in a 
single prior art reference." In re Robertson, 169 F.3d 743, 
745, 49 USPQ2d 1949, 1950 (Fed. Cir. 1999) (quoting 
Verdegaal Bros., Inc. v. Union Oil Co., 814 F.2d 628, 631, 
2 USPQ2d 1051, 1053 (Fed. Cir. 1987)) (emphasis added). 

Independent claim 1, as amended, recites: a first training method that efficiently 
builds a rough statistical model from a subset of the computer readable data set 
capable of statistical characterization; an evaluation component that evaluates the 
rough statistical model to determine whether the subset of the computer readable data 
set is an appropriate subset to be utilized to build a refined statistical model for the 
computer readable data set based at least in part on stopping criterion to facilitate 
reducing cost of clustering data relative to the computer readable data set; a second 
training method that builds the refined statistical model for the computer readable data 
set from the subset if the subset is deemed appropriate by the evaluation component, 
the refined statistical model provides a more accurate modeling of the subset than the 
rough statistical model and facilitates determining good clustering of data for a fixed 
number of clusters based at least in part on predefined accuracy criteria to facilitate 
clustering of data relative to the computer readable data set, wherein the clustered data 
is provided; and a data scheduler that, based at least in part on a data policy, adaptively 
controls the size of subsets for which the first training method is applied to facilitate 
building the refined statistical model. Guha et al. fails to disclose the distinctive aspects 
of the claimed subject matter. 

Rather, Guha et al. relates to genetic learning techniques to evolve neural network 
architectures for applications where a general representation of neural network 
architecture is linked with a genetic learning strategy creating an environment for the 
construction of custom neural networks. (See Abstract.) Guha et al. discloses cyclically 
updating a population of bit string designs for different neural networks by a genetic 
algorithm based on their fitness. (See col. 2, lns. 62-65.) Guha et al. also discloses that 
fitness of a network is a combined measure of its worth on the problem, which may take 
into account learning speed, accuracy, and cost factors such as size and complexity of the 
networks. (See col. 2, ln. 65 - col. 3, ln. 2.) 

However, unlike the claimed subject matter, Guha et al. is silent with respect to 
constructing, from a data set, a statistical model, refining the statistical model, and 
utilizing the refined model to cluster the data set. Also, Guha et al. fails to disclose 
utilizing a first training method to quickly and efficiently develop a rough 
statistical model associated with a data set, which can then be utilized to facilitate 
building a refined statistical model (e.g., more accurate statistical model) based in part on 
a second training method. Furthermore, Guha et al. is silent regarding clustering data 
utilizing the refined statistical model. 

In contrast, the claimed subject matter can facilitate characterizing and/or 
clustering a set of data (e.g., computer readable data set). The claimed subject matter can 
employ a first training method or protocol to facilitate developing a rough statistical 
model, which can be associated with a suitable set of training data, in an efficient manner. 
The claimed subject matter can evaluate the rough statistical model to determine whether 
the rough statistical model meets a predefined stopping criterion. For instance, the claimed 
subject matter can perform one or more iterations to build the rough statistical model, 
where for each additional iteration (if any), additional data can be added to the subset of 
training data to facilitate building an acceptable rough statistical model. The claimed 
subject matter can also employ a second training method or protocol that can facilitate 
building a refined statistical model that can be more accurate than the rough statistical 
model. The refined statistical model can be utilized to facilitate characterizing and/or 
clustering data relative to the set of data, where characterizing and/or clustering the 
data can yield desirable (e.g., good) clusters due in part to the predefined accuracy 
criteria employed when building the refined statistical model. Further, the claimed 
subject matter can characterize and/or cluster data in a time-efficient and 
computationally-efficient manner, as compared to conventional systems or methods. 
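
By way of non-limiting illustration only, the two-stage flow described above can be 
sketched as follows. The sketch is not taken from the specification or the claims. It 
assumes one-dimensional data, uses a few iterations of k-means as a stand-in for the 
first training method and k-means run to convergence as a stand-in for the second, and 
approximates the expected incremental benefit by the most recent improvement in holdout 
error. All names and parameters (kmeans_1d, holdout_error, build_refined_model, 
cost_per_point, growth) are hypothetical. 

```python
def kmeans_1d(points, centers, max_iters, tol=0.0):
    """Run up to max_iters Lloyd iterations on 1-D data; stop early once
    no center moves by more than tol."""
    centers = list(centers)
    for _ in range(max_iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous center.
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


def holdout_error(centers, holdout):
    """Mean squared distance from each holdout point to its nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in holdout) / len(holdout)


def build_refined_model(train, holdout, k, start_size=20, growth=2, cost_per_point=1e-3):
    """Grow the training subset until adding data no longer pays for itself,
    then refine the model on the chosen subset."""
    size = min(start_size, len(train))
    prev_err = None
    while True:
        subset = train[:size]
        # Rough model: cheap, only two iterations (assumed first training method).
        rough = kmeans_1d(subset, subset[:k], max_iters=2)
        err = holdout_error(rough, holdout)
        next_size = min(len(train), size * growth)
        if next_size == size:
            break  # all available training data is already in use
        if prev_err is not None:
            benefit = prev_err - err                    # assumed proxy for expected benefit
            cost = cost_per_point * (next_size - size)  # assumed linear cost of more data
            if benefit <= cost:
                break  # stopping criterion: growing the subset is not worthwhile
        prev_err = err
        size = next_size
    # Refined model: iterate to convergence (assumed second training method).
    refined = kmeans_1d(subset, rough, max_iters=1000, tol=1e-9)
    return size, sorted(refined)
```

In this sketch the loop that enlarges the subset plays the role of the data scheduler, 
the comparison of benefit against cost plays the role of the stopping criterion, and the 
sorted cluster centers returned at the end correspond to the clustered data being 
provided. 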

Also, independent claim 19, as amended, recites: a first parameter estimation 
protocol that efficiently builds a rough statistical model from a subset of a computer 
readable data set based at least in part on a training policy associated therewith, the 
computer readable data set is statistically characterizable; an evaluation component that 
determines whether the subset of data from which the rough statistical model was built 
is an acceptable size for building the statistical model to characterize the data set, the 
evaluation component utilizes a stopping criterion that is functionally related to an 
expected incremental benefit and an expected incremental cost associated with 
increasing the size of the subset of data to facilitate determining whether the rough 
statistical model is an acceptable size and to facilitate reducing cost of clustering data 
relative to the computer readable data set; and a second parameter estimation protocol 
that builds a refined statistical model for the data set from the subset if determined to 
have the acceptable size, the second parameter estimation protocol having an associated 
training policy, which enables the second parameter estimation protocol to build the 
refined statistical model to be a more accurate statistical model than the first parameter 
estimation protocol, the refined statistical model employed to identify clusters of data 
within the computer readable data set to facilitate clustering data relative to the 
computer readable data set, wherein the clustered data is provided. 

For at least the reasons stated herein with regard to independent claim 1, Guha et 
al. fails to disclose the distinctive aspects of the claimed subject matter as recited in 
independent claim 19. In addition, unlike the claimed subject matter, Guha et al. is silent 
regarding employing a stopping criterion that is based on an expected incremental benefit 
and an expected incremental cost associated with increasing the size of the data subset to 
facilitate determining whether a rough statistical model is an acceptable size. 

Independent claim 30 (and similarly independent claims 42 and 64), as 
amended, recites: employing a first training method to build a rough statistical model to 
characterize the subset; evaluating the rough statistical model for acceptability, if the 
rough statistical model is unacceptable, repeatedly increasing the size of the subset of 
data to provide an aggregate data set, building another rough statistical model to 
characterize the aggregate subset, and reevaluating the other rough statistical model, the 
acceptability of each rough statistical model based at least in part on a stopping criterion 
functionally related to an expected incremental benefit and an expected incremental cost 
associated with increasing the size of the aggregate subset in order to facilitate reducing 
cost associated with clustering data relative to the computer readable data set; and if the 
rough statistical model is acceptable, employing a second training method to build a 
refined statistical model based at least in part on the aggregate data set, the second 
training method being different from the first training method, the refined statistical 
model identifies data clusters contained in the computer readable data set to facilitate 
clustering of data relative to the computer readable data set, wherein the clustered data 
is provided. For at least the reasons stated herein with regard to independent claims 1 
and 19, Guha et al. fails to disclose the distinctive aspects of the claimed subject matter. 

Furthermore, independent claim 64, as amended, in part, additionally recites: 
means for setting parameters associated with cluster weights of a cluster of data. Guha 
et al. fails to disclose this distinctive feature of the claimed subject matter. 

Instead, Guha et al. discloses applying training-input to the input of the network, 
and modifying network weights if a desired output is not achieved. (See col. 3, lns. 29-33.) 
Guha et al. fails to disclose clustering data, let alone setting parameters associated with 
cluster weights of a cluster of data. 

Moreover, the Examiner does not contend that Guha et al. discloses the claimed 
subject matter as recited in claims 3-18, 20-25, 27-29, 32-41, 44-48, 50-60, 62, and 63. 
Guha et al. fails to disclose the claimed subject matter as recited in claims 3-18, 20-25, 
27-29, 32-41, 44-48, 50-60, 62, and 63. Rather, Guha et al. relates to genetic learning 
techniques to evolve neural network architectures for applications where a general 
representation of neural network architecture is linked with a genetic learning strategy 
creating an environment for the construction of custom neural networks. (See Abstract.) 

In view of at least the foregoing, Guha et al. fails to disclose each and every 
element of the claimed subject matter as recited in independent claims 1, 19, 30, 42, and 
64 (as well as independent claims 44, 53, 54, 62, and 63, and dependent claims 3-18, 
20-25, 27-29, 32-41, 45-48, 50-52, and 55-60). Therefore, the subject claims are in 
condition for allowance and the rejection should be withdrawn. 

Conclusion 

The present application is believed to be in condition for allowance in view of the 
above comments and amendments. A prompt action to such end is earnestly solicited. 

In the event any fees are due in connection with this document, the Commissioner 
is authorized to charge those fees to Deposit Account No. 50-1063 [MSFTP184US]. 

Should the Examiner believe a telephone interview would be helpful to expedite 
favorable prosecution, the Examiner is invited to contact applicants' undersigned 
representative at the telephone number below. 

Respectfully submitted, 
Amin, Turocy & Calvin, LLP 

/Himanshu S. Amin/ 

Himanshu S. Amin 
Reg. No. 40,894 



Amin, Turocy & Calvin, LLP 
24th Floor, National City Center 
1900 E. 9th Street 
Cleveland, Ohio 44114 
Telephone (216) 696-8730 
Facsimile (216) 696-8731 